Closed Bug 1724242 Opened 4 years ago Closed 4 years ago

Background update applied when mixing X11/Wayland and opening remote link

Categories

(Toolkit :: Application Update, defect)

Desktop
Linux
defect

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: gerard-majax, Assigned: stransky)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file)

I have recently switched to forcing Wayland on my system (Ubuntu/21.10) while previously Nightly was running over XWayland.

Since then, I have been unable to open links from third-party apps (IRC, Thunderbird, etc.); it always ends with an error message stating that Firefox is already running.

STR:

  1. MOZ_ENABLE_WAYLAND=1 firefox
  2. Click on a link in Thunderbird

Expected:
A new tab opens

Actual:
Error stating Firefox is already running and is unresponsive.

It seems that a side effect of this is exposing bug 1480452 and making me see about:restartrequired. Digging into this shows that when it is displayed, the platformBuildID reported in about:support actually differs from the one in platform.ini on disk, so this is not a false-positive mismatch. From discussion on Matrix,

agashlin> What you're describing seems consistent with this:
    An update is ready, Firefox is running.
    You try to open a URL from some other program. This starts Firefox, normally that will just pass the command line to the older instance very early and exit (remoting)
    For some reason, the newly launched Firefox can't find or communicate with the older one in order to remote the command line, so it proceeds as if you'd asked it to start a new instance (e.g. with --no-remote)
    The new instance applies the update and restarts itself (and again fails to remote)
    The new instance tries to open the profile, which is already in use, so you see the "Firefox is already running" message.
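The sequence above can be illustrated with a toy model. This is only a sketch, not Firefox code: `ToyInstall`, `launch` and every field name are hypothetical, and the restart step is left out. It only shows why a failed remoting attempt leaves an applied update and a profile-in-use error behind.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyInstall:
    """Hypothetical stand-in for an install directory and its profile."""
    build_id: str                         # build currently on disk
    staged_update: Optional[str] = None   # build ID of a pending update, if any
    profile_locked: bool = False          # True while an older instance runs
    log: List[str] = field(default_factory=list)


def launch(install: ToyInstall, url: str, remoting_works: bool) -> str:
    """Toy startup order: remoting, then pending updates, then profile lock."""
    if remoting_works:
        install.log.append(f"handed {url} to the running instance")
        return "remoted"
    # Remoting failed, so this process behaves like a fresh instance.
    if install.staged_update is not None:
        install.build_id = install.staged_update
        install.staged_update = None
        install.log.append("applied pending update")
    if install.profile_locked:
        install.log.append("profile already in use")
        return "already-running error"
    install.profile_locked = True
    return "started"


install = ToyInstall(build_id="old", staged_update="new", profile_locked=True)
outcome = launch(install, "https://example.org", remoting_works=False)
print(outcome, install.build_id)
```

With `remoting_works=False` the model ends in the error state while the on-disk build has already moved to the new ID, which is the mismatch the still-running older instance later trips over.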

My recent experiments seem to corroborate this: I managed to avoid clicking on any link, and while Nightly was showing that updates were ready to be applied (I think at least two have shipped), I never hit a mismatching platformBuildID. Then I purposely clicked on a link, waited for the unresponsive error message to pop up, and could verify afterwards that the on-disk platformBuildID did change.
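For anyone wanting to repeat that check, a minimal sketch of the comparison follows. It assumes platform.ini has a `[Build]` section with a `BuildID` key and that the running build ID has been copied by hand from about:support; the function names and the sample build IDs are made up for illustration.

```python
import configparser
from typing import Optional


def on_disk_build_id(platform_ini_text: str) -> Optional[str]:
    """Return BuildID from platform.ini text, or None if it is absent."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(platform_ini_text)
    # Assumption: the key lives under [Build]; fallback covers a missing section.
    return parser.get("Build", "BuildID", fallback=None)


def update_applied_underneath(running_build_id: str, platform_ini_text: str) -> bool:
    """True when the files on disk belong to a different build than the one running."""
    disk_id = on_disk_build_id(platform_ini_text)
    return disk_id is not None and disk_id != running_build_id


# Made-up values; in practice read the real platform.ini from the install directory.
SAMPLE_INI = "[Build]\nBuildID=20210806000000\n"
print(update_applied_underneath("20210805000000", SAMPLE_INI))
```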

Blocks: wayland
Component: General → Graphics
Product: Toolkit → Core

Martin, I'm wondering whether this might be a dupe of bug 1645038 or bug 1634096. In both cases, they seem to be stalling and people have a hard time reproducing. It reproduces 100% of the time for me, so I'd be happy to help there.

Flags: needinfo?(stransky)

(In reply to Robert Mader [:rmader] from comment #1)

I think https://mastransky.wordpress.com/2020/03/16/wayland-x11-how-to-run-firefox-in-mixed-environment/ should help here.

It might, but:

  • it's invasive, I dislike that (but so far I don't really see how we could fix that).
  • to the best of my knowledge, https://bugzilla.mozilla.org/show_bug.cgi?id=1480452#c8 does not make any obvious link between being unable to find the Firefox instance and about:restartrequired being shown "too much".

(In reply to Robert Mader [:rmader] from comment #1)

I think https://mastransky.wordpress.com/2020/03/16/wayland-x11-how-to-run-firefox-in-mixed-environment/ should help here.

Forcing MOZ_DBUS_REMOTE=1 does indeed seem to do the trick.

(In reply to Alexandre LISSY :gerard-majax from comment #4)

Forcing MOZ_DBUS_REMOTE=1 does indeed seem to do the trick.

I wonder if we can make that the default, at least when running on Xwayland (or if we detect that dbus is available, or whatever the reason is why it's not used by default).

I guess Martin might know more about why that's not the case?

I've got some update pending, I can see it under the updates/0/ and so far after opening links from different apps, no about:restartrequired yet.

I don't think this is related to updates. I see it every time when clicking on a link in a non-Wayland app (e.g. Signal) when running Wayland Firefox.

It might be related to the WMClass stuff -- I had to tweak StartupWMClass to fix start-up notifications.

(In reply to Laurențiu Nicola from comment #8)

I don't think this is related to updates. I see it every time when clicking on a link in a non-Wayland app (e.g. Signal) when running Wayland Firefox.

Please read carefully the first comment. It's not directly related to updates, but it can impact them.

Severity: -- → S3

(In reply to Robert Mader [:rmader] from comment #5)

(In reply to Alexandre LISSY :gerard-majax from comment #4)

Forcing MOZ_DBUS_REMOTE=1 does indeed seem to do the trick.

I wonder if we can make that the default, at least when running on Xwayland (or if we detect that dbus is available, or whatever the reason is why it's not used by default).

That's Bug 1677462.

Flags: needinfo?(stransky)
Component: Graphics → Widget: Gtk
Priority: -- → P3

I can reproduce this issue, and I have a workaround.

I originally saw this when I tried testing Firefox on Wayland by using MOZ_ENABLE_WAYLAND=1 firefox. If I clicked a link from another application, I'd get the same dialog and the link wouldn't open.

When I switched to setting MOZ_ENABLE_WAYLAND=1 in my user environment for all programs, this no longer happened.

I could reproduce it again by unsetting MOZ_ENABLE_WAYLAND and then running firefox https://example.org; that produced the same dialog again.

I can reproduce this issue, and I have a workaround.

That works with Thunderbird, but it still happens to me when I click links in non-Wayland apps like Signal. I did set MOZ_ENABLE_WAYLAND globally (in ~/.config/environment.d on my DE).

The root cause is easy now that I found it: https://searchfox.org/mozilla-central/rev/49b6e60550243b4b4d71d6ab35a3ff2b9a9f7c69/toolkit/xre/nsAppRunner.cpp#4553-4615

This is where we will try to process pending updates. Right after that, we try to lock the profile.

So, what happens is:

And from there, we are doomed:

  • Update has been applied in the background of the running session,
  • Next time a process needs to be created, we will hit the platformBuildID mismatch and so actively present an about:restartrequired page to the user

Component: Widget: Gtk → Application Update
OS: Unspecified → Linux
Product: Core → Toolkit
Hardware: Unspecified → Desktop
Summary: Running with wayland breaks when trying to open link from other app → Background update applied when mixing X11/Wayland and opening remote link
Version: unspecified → Trunk

I'm not sure whether we should consider this just another case of the first instantiation mechanism described in https://bugzilla.mozilla.org/show_bug.cgi?id=1480452#c8?

I'll let you decide whether we should keep this bug open to track this specific case or whether we can just add more info to your existing description of the issue.

Flags: needinfo?(ksteuber)

I have updates disabled (system-wide install on Linux) and I still get a "Firefox is already running, but is not responding" error. Since you updated the issue title to make it about updates, should I file another one?

The product::component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit auto_nag documentation.

Priority: P3 → --

(In reply to Laurențiu Nicola from comment #15)

I have updates disabled (system-wide install on Linux) and I still get a "Firefox is already running, but is not responding" error. Since you updated the issue title to make it about updates, should I file another one?

If you are referring to the fact that the remote instance is not found when mixing X11/Wayland, it's already filed, as Martin said in comment 10.

I can confirm that MOZ_DBUS_REMOTE=1 fixes it, but the linked issue is WONTFIX.

(In reply to Alexandre LISSY :gerard-majax from comment #0)

STR:

  1. MOZ_ENABLE_WAYLAND=1 firefox
  2. Click on a link in Thunderbird

I guess this is a STR too:

  • MOZ_ENABLE_WAYLAND=1 firefox
  • firefox $url

So the question is: should Firefox try both remote methods? Actually, can we switch to dbus by default even on X11, and fall back to XRemote when that's not possible?

Flags: needinfo?(stransky)

(In reply to Alexandre LISSY :gerard-majax from comment #14)

I'm not sure whether we should consider this just another case of the first instantiation mechanism described in https://bugzilla.mozilla.org/show_bug.cgi?id=1480452#c8?

Hmm, well the about:restartrequired issue, as you have noted, is pretty clearly a result of Bug 1480452. It doesn't seem like this issue particularly affects the way that that bug needs to be fixed, though it does seem to represent a different way to end up in that situation.

As a bit of a side note, I think that I've got some time coming up to dig into that bug. It's going to be a huge job though, and I don't have a lot of resources to tackle it with. So I can't really promise any sort of timeline on making progress.

Even assuming, however, that we got Bug 1480452 fixed, it seems to me that it would mitigate this issue but not fix it. That is to say, you wouldn't see the about:restartrequired page, but this original issue would, I believe, remain:

Expected:
A new tab opens

Actual:
Error stating Firefox is already running and is unresponsive.

From what I have read in this bug, it sounds like the underlying issue here is this:

xdg-open is called in a way where it will try to find a remote instance for Firefox but will ultimately fail because of those differences of windowing systems ;

It seems like fixing that would solve both problems. If remoting worked properly in this context, opening the link would result in Firefox opening, remoting into the existing instance of Firefox, and exiting before it had a chance to install updates.

I will, of course, continue to attempt to make progress on Bug 1480452. But I recommend that this issue be solved properly by fixing remoting rather than waiting for a mitigation that is still a long way off. It sounds like the hope was that Bug 1677462 would provide this. But given that it has been marked WONTFIX, perhaps something else should be pursued. I don't really have any suggestions there; that isn't really my area of expertise.

I'll let you decide whether we should keep this bug open to track this specific case or whether we can just add more info to your existing description of the issue.

Given that this represents a new activation mechanism for Bug 1480452, I think it makes sense to have a bug open for it, like we have for Bug 1705217.

Flags: needinfo?(ksteuber)

(In reply to Mike Hommey [:glandium] from comment #19)

(In reply to Alexandre LISSY :gerard-majax from comment #0)

STR:

  1. MOZ_ENABLE_WAYLAND=1 firefox
  2. Click on a link in Thunderbird

I guess this is a STR too:

  • MOZ_ENABLE_WAYLAND=1 firefox
  • firefox $url

So the question is: should Firefox try both remote methods? Actually, can we switch to dbus by default even on X11, and fall back to XRemote when that's not possible?

It depends on what you mean by 'that's not possible'. That may mean:

  • fall back to X11 when the DBus session interface is missing or DBus support is not built in.
  • fall back to X11 when no Firefox is listening on the DBus interface.

The second case slows down Firefox startup, since you need to start and query two remotes. For instance, Fedora uses DBus only, on both X11 and Wayland, as we don't want to slow down startup by testing various remotes.

Also, I can't imagine a scenario where we want to fall back to X11 when we don't find an active DBus client - opening a release instance (X11) from Nightly (DBus) doesn't look correct to me.
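To make the difference between the two fallback meanings concrete, here is a small sketch. It is not the remoting code; `find_remote` and its parameters are hypothetical, and the "probes" list simply counts how many remotes had to be queried before giving an answer.

```python
from typing import List, Optional, Tuple


def find_remote(
    dbus_usable: bool,
    dbus_listener: bool,
    x11_listener: bool,
    probe_x11_when_dbus_silent: bool,
) -> Tuple[Optional[str], List[str]]:
    """Return (remote that answered or None, list of remotes that were probed)."""
    probes: List[str] = []
    if dbus_usable:
        probes.append("dbus")
        if dbus_listener:
            return "dbus", probes
        if not probe_x11_when_dbus_silent:
            # First meaning only: DBus exists, nobody answered, so stop here.
            return None, probes
    # Reached when DBus is missing or not built in (first meaning),
    # or when nobody answered on DBus and we probe X11 too (second meaning).
    probes.append("x11")
    return ("x11" if x11_listener else None), probes


# Second meaning: two probes are paid on every start with no DBus listener.
print(find_remote(True, False, True, probe_x11_when_dbus_silent=True))
```

Under the second meaning, a launch with no DBus listener always pays for both probes, which is the startup cost described above.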

IMHO the best solution may be https://phabricator.services.mozilla.com/D97146 - use DBus if there's a possibility we're running on Wayland, or always use DBus when Firefox is built with --enable-dbus (I don't have any preference here).
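The two options can be sketched as a single selection function. This is an illustration only and not the content of D97146: `pick_remote` and its boolean inputs are hypothetical, and how "may be running on Wayland" would be detected is deliberately left out.

```python
def pick_remote(built_with_dbus: bool, may_be_wayland: bool, dbus_whenever_built: bool) -> str:
    """Choose one remoting method for this process (toy policy)."""
    if not built_with_dbus:
        return "x11"
    if dbus_whenever_built or may_be_wayland:
        return "dbus"
    return "x11"


# A Wayland instance is running; a second process is launched from an X11 app.
running = pick_remote(built_with_dbus=True, may_be_wayland=True, dbus_whenever_built=True)
launcher = pick_remote(built_with_dbus=True, may_be_wayland=False, dbus_whenever_built=True)
print(running == launcher)
```

The point of the "always when built with --enable-dbus" option is that the running instance and the newly launched process agree on the method regardless of which windowing backend each one sees.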

Flags: needinfo?(stransky)
Assignee: nobody → stransky
Status: NEW → ASSIGNED

I would just like to mention that should it be possible to detect this situation, we could also make ShouldProcessUpdates avoid this issue.

(In reply to Nick Alexander :nalexander [he/him] from comment #24)

I would just like to mention that should it be possible to detect this situation, we could also make ShouldProcessUpdates avoid this issue.

From what I recall of some Matrix discussion after comment 13, changing the behavior of ShouldProcessUpdates was not the favorite option here.

Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/7cbf3890fe9c [Linux] Use DBus remote when Firefox is built with --enable-dbus, r=glandium

Backed out for causing build bustages.

Flags: needinfo?(stransky)
Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/117f94e1e765 [Linux] Use DBus remote when Firefox is built with --enable-dbus, r=glandium
Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/ef22d8cbf4ef [Linux] Use DBus remote when Firefox is built with --enable-dbus, r=glandium
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
Depends on: 1739919
Duplicate of this bug: 1645038